Evaluation of gene-finding programs on mammalian sequences.

Authors

  • S Rogic
  • A K Mackworth
  • F B Ouellette
Abstract

We present an independent comparative analysis of seven recently developed gene-finding programs: FGENES, GeneMark.hmm, Genie, Genscan, HMMgene, Morgan, and MZEF. For evaluation purposes we developed a new, thoroughly filtered, and biologically validated dataset of mammalian genomic sequences that does not overlap with the training sets of the programs analyzed. Our analysis shows that the new generation of programs performs substantially better than the programs analyzed in previous studies. The accuracy of the programs was also examined as a function of various sequence and prediction features, such as G + C content of the sequence, length and type of exons, signal type, and score of the exon prediction. This approach pinpoints the strengths and weaknesses of each individual program as well as those of computational gene-finding in general. The dataset used in this analysis (HMR195) as well as the tables with the complete results are available at http://www.cs.ubc.ca/~rogic/evaluation/.
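
The abstract does not spell out how accuracy or G + C content is computed, so the following is only a minimal sketch under stated assumptions. It uses the definitions that are conventional in gene-finding evaluations (sensitivity = TP / (TP + FN), specificity = TP / (TP + FP)), treats exons as 1-based inclusive (start, end) pairs, and counts an exon as correct only when both boundaries match. All function names are hypothetical and are not taken from the paper or from any of the programs listed.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous A/C/G/T bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def coding_positions(exons):
    """Expand (start, end) exon intervals (1-based, inclusive) into a set of positions."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_accuracy(annotated, predicted):
    """Return (sensitivity, specificity) measured over coding nucleotides.

    Assumed definitions: sensitivity = TP / (TP + FN),
    specificity = TP / (TP + FP).
    """
    truth = coding_positions(annotated)
    pred = coding_positions(predicted)
    tp = len(truth & pred)
    sensitivity = tp / len(truth) if truth else 0.0
    specificity = tp / len(pred) if pred else 0.0
    return sensitivity, specificity


def exon_accuracy(annotated, predicted):
    """Return (sensitivity, specificity) where an exon is correct
    only if both of its boundaries match an annotated exon exactly."""
    truth, pred = set(annotated), set(predicted)
    correct = len(truth & pred)
    sensitivity = correct / len(truth) if truth else 0.0
    specificity = correct / len(pred) if pred else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Made-up coordinates for illustration only; not data from HMR195.
    annotated = [(101, 200), (301, 400)]
    predicted = [(101, 200), (311, 420)]
    print(gc_content("ATGCGC"))
    print(nucleotide_accuracy(annotated, predicted))
    print(exon_accuracy(annotated, predicted))
```

Grouping sequences into bins by `gc_content` and averaging the two accuracy functions within each bin would give one way to look at accuracy as a function of G + C content, though the binning actually used in the study is not stated here.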

Similar resources

Mitochondrial DNA characterization of Sergentomyia sintoni populations and finding mammalian Leishmania infections in this sandfly by using ITS-rDNA gene

Sergentomyia sintoni is the natural vector of Sauroleishmania species of lizards. This sandfly is abundant in and around the burrows of great gerbils. S. sintoni was collected from peridomestic animal shelters, inside and around houses and also from the nearby burrows of the gerbil reservoir hosts, Rhombomys opimus, in several provinces of Iran. Mitochondrial Cytochrome b (Cyt b) of sandflies, wh...

An Evolutionary and Phylogenetic Study of the BMP15 Gene

DNA sequence data contains a wealth of biologically useful information. Recent innovations in DNA sequencing technology have greatly increased our capacity to determine massive amounts of nucleotide sequences. These sequences can be used to specify the characteristics of different regions, interpret the evolutionary relationships between categorized groups, likelihood of performing multiple com...

Evaluating and Improving the Accuracy of Computational Gene-Finding on Mammalian DNA Sequences

This thesis presents work in one of the main research areas in Computational Biology: computational gene-finding in higher eukaryotic genomic DNA. Programs for identification of gene structures have been in existence for more than a decade, but today they are used more extensively than ever to analyze the enormous amount of sequence data coming from various genome sequencing projects. Consequen...

In silico cloning and bioinformatics study of Brucella melitensis Omp31 antigen in different mammalian expression vectors

Brucella melitensis, a pathogenic gram-negative intracellular bacterium, causes brucellosis in animals and humans. According to the literature, the B. melitensis outer membrane protein 31 (Omp31) is considered an important vaccine candidate against brucellosis. The aim of the current study was to compare three different expression constructs containing B. melitensis Omp31 antigen using bioinf...

Bioinformatics Study and Investigation of the Expression Pattern of Several Important Genes Involved in Glycyrrhizin Synthesis of Glycyrrhiza glabra L. in Autumn and Spring Seasons

Glycyrrhiza is one of the important medicinal plants that is in danger of extinction. Finding accessions with a higher glycyrrhizic acid content is very important in breeding programs. Functional genomics methods such as EST sequencing make it possible to identify consensus gene families among the studied species and to interpret the genome. In this research, 55960 EST sequences of ...

Journal:
  • Genome Research

Volume 11, Issue 5

Pages -

Publication date: 2001